Merging data frames

Problem

You want to merge two data frames on a given column from each (like a join in SQL).

Solution

# Make a data frame mapping story numbers to titles
stories <- read.table(header=T, con <- textConnection('
   storyid  title
    1       lions
    2      tigers
    3       bears
    '))
close(con)

# Make another data frame with the data and story numbers (no titles)
data <- read.table(header=T, con <- textConnection('
    subject storyid rating
          1       1    6.7
          1       2    4.5
          1       3    3.7
          2       2    3.3
          2       3    4.1
          2       1    5.2
    '))
close(con)

# Merge the two data frames
merge(stories, data, "storyid")
#  storyid subject rating  title
#       1       1    6.7  lions
#       1       2    5.2  lions
#       2       1    4.5 tigers
#       2       2    3.3 tigers
#       3       1    3.7  bears
#       3       2    4.1  bears

If the two data frames have different names for the columns you want to match on, the names can be specified:

# In this case, the column is named 'id' instead of storyid
stories2 <- read.table(header=T, con <- textConnection('
   id       title
    1       lions
    2      tigers
    3       bears
    '))
close(con)

# Merge on stories2$id and data$storyid.
merge(x=stories2, y=data, by.x="id", by.y="storyid")
# id  title subject rating
#  1  lions       1    6.7
#  1  lions       2    5.2
#  2 tigers       1    4.5
#  2 tigers       2    3.3
#  3  bears       1    3.7
#  3  bears       2    4.1

# Note that the column name is inherited from the first data frame (x=stories2).

It is possible to merge on multiple columns:

# Make up more data
animals <- read.table(header=T, con <- textConnection('
   size type         name
  small  cat         lynx
    big  cat        tiger
  small  dog    chihuahua
    big  dog "great dane"
'))
close(con)

observations <- read.table(header=T, con <- textConnection('
   number  size type
        1   big  cat
        2 small  dog
        3 small  dog
        4   big  dog
'))
close(con)

merge(observations, animals, c("size","type"))
#  size type number       name
#   big  cat      1      tiger
#   big  dog      4 great dane
# small  dog      2  chihuahua
# small  dog      3  chihuahua

Notes

After merging, it may be useful to change the order of the columns. See ../Reordering the columns in a data frame.